Rough Sets and Confidence Attribute Bagging for Chinese Architectural Document Categorization
Abstract
Traditional feature selection methods rely on threshold filtering, which discards much useful architectural information, and the Bagging algorithm gives all of its weak classifiers the same voting weight. To address both problems and improve the performance of Chinese architectural document categorization, a new algorithm based on rough sets and Confidence Attribute Bagging is proposed. Rough sets are used for feature selection. First, the core attributes are found from the discernibility matrix, and one of them is taken as the starting point. Then attribute significance and dependency are used as heuristic information to select features. A Chinese architectural document classifier is then designed with the Confidence Attribute Bagging algorithm: the voting weight of each weak classifier is derived from its results, and the strong classifier's result is obtained by voting among the weak classifiers. The experimental results show that the novel method is not only easy to implement but also effectively reduces the dimensionality of the feature space and improves classification accuracy.
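The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small decision table with categorical (here binary term-presence) attributes, uses the standard rough set facts that the core consists of the attributes appearing as singleton entries of the discernibility matrix and that the dependency degree is the fraction of objects in the positive region, and for simplicity starts the greedy search from the whole core rather than from a single core attribute. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def core_attributes(rows, labels):
    """Return attributes that form singleton entries of the discernibility matrix."""
    n_attrs = len(rows[0])
    core = set()
    for (x, dx), (y, dy) in combinations(zip(rows, labels), 2):
        if dx == dy:
            continue  # only pairs with different decisions must be discerned
        differing = [a for a in range(n_attrs) if x[a] != y[a]]
        if len(differing) == 1:
            core.add(differing[0])
    return core


def dependency(rows, labels, attrs):
    """Dependency degree: share of objects whose equivalence class has one decision."""
    classes = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(label)
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)


def select_features(rows, labels):
    """Greedy selection: start from the core, add the attribute that raises
    the dependency degree the most until it matches the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    selected = sorted(core_attributes(rows, labels))
    while dependency(rows, labels, selected) < target:
        remaining = [a for a in all_attrs if a not in selected]
        best = max(remaining, key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(best)
    return selected


# Hypothetical toy table: four documents, four binary term features.
rows = [(1, 0, 1, 1), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 0)]
labels = ["A", "B", "B", "B"]

print(core_attributes(rows, labels))   # {2}
print(select_features(rows, labels))   # [2, 0]
```

In this toy table only attribute 2 separates the first two documents, so it forms the core. One further attribute is enough to reach the dependency degree of the full attribute set, which reduces four features to two.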
Similar resources
Keyword Reduction for Text Categorization using Neighborhood Rough Sets
Keyword reduction is a technique that removes some less important keywords from the original dataset. Its aim is to decrease the training time of a learning machine and improve the performance of text categorization. Some researchers have applied rough sets, a popular computational intelligence tool, to reduce keywords. However, the classical rough set model, which is usually adopted, can just ...
Fuzzy-rough attribute reduction with application to web categorization
Due to the explosive growth of electronically stored information, automatic methods must be developed to aid users in maintaining and using this abundance of information effectively. In particular, the sheer volume of redundancy present must be dealt with, leaving only the information-rich data to be processed. This paper presents a novel approach, based on an integrated use of fuzzy and rough s...
Multiple Sets of Rules for Text Categorization
This paper concerns how multiple sets of rules can be generated using a rough sets-based inductive learning method and how they can be combined for text categorization by using Dempster’s rule of combination. We first propose a boosting-like technique for generating multiple sets of rules based on rough set theory, and then model outcomes inferred from rules as pieces of evidence. The various e...
A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationarity and uncertainty of the signals should be used for ECG analysis. Ex...
A Comparative Study on Chinese Text Categorization Methods
This paper reports our comparative evaluation of three machine learning methods on Chinese text categorization. Whereas a wide range of methods have been applied to English text categorization, relatively few studies have been done on Chinese text categorization. Based on a re-constructed People’s Daily corpus, a series of controlled experiments evaluate three machine learning methods, namely k...
Journal: JSW
Volume: 6
Publication date: 2011